Goto

Collaborating Authors

 efficient and low-complexity algorithm


Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Neural Information Processing Systems

We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. In Restless-UCB, we present a novel method to construct offline instances, which only requires $O(N)$ time-complexity ($N$ is the number of arms) and is exponentially better than the complexity of existing learning policy. We also prove that Restless-UCB achieves a regret upper bound of $\tilde{O}((N+M^3)T^{2\over 3})$, where $M$ is the Markov chain state space size and $T$ is the time horizon. Compared to existing algorithms, our result eliminates the exponential factor (in $M,N$) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems. As a result, our analysis technique can also be adopted to tighten the regret bounds of existing algorithms. Finally, we conduct experiments based on real-world dataset, to compare the Restless-UCB policy with state-of-the-art benchmarks. Our results show that Restless-UCB outperforms existing algorithms in regret, and significantly reduces the running time.


Review for NeurIPS paper: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Neural Information Processing Systems

I must first admit that judging this paper was a fairly challenging task given the mixed opinions expressed by the reviewers, together with my own impressions after having scrutinized the manuscript in detail. The reviewers largely agree that the paper deserves credit as it tackles the challenging, relevant and (relatively) scarcely studied topic of restless bandit learning. I believe the main value of the paper is in the introduction of the birth-death Markov chain structure for arms of a restless bandit, together with the monotonicity and positive correlation assumptions on rewards and transitions. These are not unnatural assumptions, as evidenced by modeling literature on scheduling over wireless channels and queueing systems, and seem to greatly alleviate the computational complexity of a portion of the learning process. On the other hand, the reviewers are not fully convinced about the significance of the proposed algorithm and regret bound proven in the paper, given that the analysis is carried out for a highly structured ensemble of Markov decision processes.


Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Neural Information Processing Systems

We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. In Restless-UCB, we present a novel method to construct offline instances, which only requires O(N) time-complexity ( N is the number of arms) and is exponentially better than the complexity of existing learning policy. We also prove that Restless-UCB achieves a regret upper bound of \tilde{O}((N M 3)T {2\over 3}), where M is the Markov chain state space size and T is the time horizon. Compared to existing algorithms, our result eliminates the exponential factor (in M,N) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.